MoSIFT: Recognizing Human Actions in Surveillance Videos
Authors
Abstract
The goal of this paper is to build robust human action recognition for real-world surveillance videos. Local spatio-temporal features around interest points provide compact but descriptive representations for video analysis and motion recognition. Current approaches tend to extend spatial descriptions by adding a temporal component to the appearance descriptor, which captures motion information only implicitly. We propose an algorithm called MoSIFT, which detects interest points and not only encodes their local appearance but also explicitly models their local motion. The idea is to detect distinctive local features through both local appearance and motion. We construct MoSIFT feature descriptors in the spirit of the well-known SIFT descriptor, making them robust to small deformations through grid aggregation. We also introduce a bigram model that captures correlations between local features and, through them, the more global structure of actions. The method advances the state-of-the-art result on the KTH dataset to an accuracy of 95.8%. We also applied our approach to 100 hours of surveillance data as part of the TRECVID Event Detection task, with very promising results on recognizing human actions in real-world surveillance videos.
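The abstract names three ingredients: interest points that must show both distinctive appearance and motion, a SIFT-style descriptor built by grid aggregation over appearance and motion, and a bigram model over local features. The NumPy sketch below illustrates one way these pieces could fit together. It is not the authors' implementation and rests on several assumptions: optical flow for each patch (`flow_u`, `flow_v`) is assumed to be precomputed by some external method; the 4x4 grid with 8 orientation bins follows the standard SIFT layout; the motion threshold in `has_enough_motion` and the distance-based pairing rule in `bigram_histogram` are hypothetical choices. All function names are illustrative, and the appearance-based (e.g. difference-of-Gaussians) detection step is not shown.

```python
import numpy as np


def grid_orientation_histogram(dx, dy, grid=4, bins=8):
    """SIFT-style grid aggregation of a 2-D vector field (dx, dy) over a patch.

    The patch is split into grid x grid cells, and each cell accumulates a
    magnitude-weighted histogram of vector orientations. Pooling within cells
    is what makes the result tolerant to small deformations.
    """
    h, w = dx.shape
    mag = np.hypot(dx, dy)
    ang = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    # Map each pixel to an orientation bin in [0, bins).
    b = np.minimum((ang * bins / (2 * np.pi)).astype(int), bins - 1)
    # Map each pixel's row and column to its grid cell.
    rows, cols = np.meshgrid(np.arange(h) * grid // h,
                             np.arange(w) * grid // w, indexing="ij")
    hist = np.zeros((grid, grid, bins))
    np.add.at(hist, (rows, cols, b), mag)
    v = hist.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def mosift_like_descriptor(patch, flow_u, flow_v, grid=4, bins=8):
    """Concatenate an appearance histogram (image gradients) with a motion
    histogram (optical flow): 2 * grid * grid * bins values, i.e. 256 for
    the default 4x4 grid with 8 bins."""
    gy, gx = np.gradient(patch.astype(float))
    appearance = grid_orientation_histogram(gx, gy, grid, bins)
    motion = grid_orientation_histogram(flow_u, flow_v, grid, bins)
    return np.concatenate([appearance, motion])


def has_enough_motion(flow_u, flow_v, min_flow=1.0):
    """Keep a candidate interest point only if its patch moves enough.
    `min_flow` is a hypothetical threshold on the mean flow magnitude."""
    return float(np.mean(np.hypot(flow_u, flow_v))) >= min_flow


def quantize(descriptors, codebook):
    """Assign each descriptor to its nearest codebook entry (visual word)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def bigram_histogram(words, positions, vocab_size, radius):
    """Count unordered pairs of visual words whose points lie within `radius`
    of each other in (x, y, t). Returns the upper triangle of the pair-count
    matrix as a flat vector (hypothetical pairing rule)."""
    hist = np.zeros((vocab_size, vocab_size))
    for a in range(len(words)):
        for c in range(a + 1, len(words)):
            if np.linalg.norm(positions[a] - positions[c]) <= radius:
                i, j = sorted((int(words[a]), int(words[c])))
                hist[i, j] += 1
    return hist[np.triu_indices(vocab_size)]


# Usage on synthetic data: a random patch moving uniformly to the right.
rng = np.random.default_rng(0)
patch = rng.random((16, 16))
flow_u, flow_v = np.ones((16, 16)), np.zeros((16, 16))
desc = mosift_like_descriptor(patch, flow_u, flow_v)
print(desc.shape)  # (256,)
```

Keeping appearance and motion as separate, individually normalized halves mirrors the abstract's point that motion is modeled explicitly rather than folded into a temporal extension of the appearance descriptor. In this sketch, the bigram vector would simply be appended to a bag-of-words histogram before classification.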
Similar resources
MMM-TJU at TRECVID 2010: Surveillance Event Detection
Semantic event detection in the huge volume of surveillance video, in both retrospective and real-time modes, is essential to a variety of higher-level public-security applications. In TRECVID 2010, to overcome the limitations of the traditional human action analysis method with human detection/tracking and domain knowledge, we evaluate the general framework f...
Informedia @ TRECVID 2009: Analyzing Video Motions
The Informedia team participated in the high-level feature extraction and surveillance event detection tasks. This year, we focused especially on analyzing motion in videos. We developed a robust new descriptor called MoSIFT, which explicitly encodes appearance features together with motion information. For the high-level feature detection, we trained multi-modality classifie...
Long Term Activity Analysis in Surveillance Video Archives
Surveillance video recording is becoming ubiquitous in daily life for public areas such as supermarkets, banks, and airports. The rate at which surveillance video is being generated has accelerated demand for machine understanding to enable better content-based search capabilities. Analyzing human activity is one of the key tasks to understand and search surveillance videos. In this thesis, we ...
Distant Human Interaction Recognition with Kinect
Detecting human interactions in a public place where no physical touch occurs has important applications in many surveillance tasks. In this paper, we explore the possibility of automatically detecting such distant human interactions without recognizing the specific human actions. Specifically, we use a highly simplified formulation of the interaction in this paper: 1) when a person does not int...
Publication date: 2009